{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. 处理CSV文件笔记（第14章）  \n",
    "本文主要在python下介绍CSV文件，CSV 表示“Comma-Separated Values（逗号分隔的值）”，CSV文件是简化的电子表格，保存为纯文本文件。CSV 文件中的每行代表电子表格中的一行，逗号分割了该行中的单元格。Python 的csv模块让解析CSV 文件变得容易。CSV模块为Python自带库。常用函数如下:\n",
    "\n",
    "|函数|用途|备注|\n",
    "|:-:|:-:|:-:|\n",
    "|exampleFile = open(path)|打开文件，返回file文件|非csv模块中的函数，但可以用于打开csv文件|\n",
    "|csv.reader(exampleFile)|将file文件转换为一个Reader对象|不能直接将文件名字符串传递给csv.reader()函数|\n",
    "|exampleData = list(exampleReader)|在Reader 对象上应用list()函数，将返回一个csv文件内容列表|非csv模块中的函数|\n",
    "|outputFile = open('output.csv', 'w', newline='')|open()并传入'w'，以写模式打开一个文件|如果忘记设置newline关键字参数为空字符，output.csv中的行距将有两倍|\n",
    "|outputWriter.writerow[lists]|将lists写入csv文件中||\n",
    "|csv.writer(csvFile, delimiter='\\t')|将csv文件中的分隔符改为'\\t'||\n",
    "|csv.writer(csvFile, lineterminator='\\n\\n')|将csv文件中的行终止字符改为'\\n\\n'||"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. 项目练习"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.1 项目：从CSV 文件中删除表头  \n",
    "读取当前工作目录中所有扩展名为.csv 的文件，除掉第一行的内容重新写入同名的文件。用新的、无表头的内容替换CSV 文件的旧内容。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Removing header from example.csv...\n"
     ]
    }
   ],
   "source": [
    "import csv\n",
    "import os\n",
    "\n",
    "\n",
    "# 创建文件夹，exist_ok=True表示文件夹如果存在则不报错\n",
    "os.makedirs('headerRemoved', exist_ok=True)\n",
    "# Loop through every file in the current working directory.\n",
    "# 查找本地所有文件\n",
    "for csvFilename in os.listdir('.'):\n",
    "    if not csvFilename.endswith('.csv'):\n",
    "         # skip non-csv files 跳过不是csv文件\n",
    "        continue\n",
    "    print('Removing header from ' + csvFilename + '...')\n",
    "\n",
    "    # Read the CSV file in (skipping first row). 读取文件跳过第一行\n",
    "    csvRows = []\n",
    "    csvFileObj = open(csvFilename)\n",
    "    readerObj = csv.reader(csvFileObj)\n",
    "    # 读取每一行\n",
    "    for row in readerObj:\n",
    "        # 跳过第一行\n",
    "        # readerObj.line_num 表示行号从1开始\n",
    "        if readerObj.line_num == 1:\n",
    "                 # skip first row\n",
    "            continue\n",
    "        # 保存数据\n",
    "        csvRows.append(row)\n",
    "    csvFileObj.close()\n",
    "\n",
    "    # Write out the CSV file. 写文件\n",
    "    csvFileObj = open(os.path.join(\n",
    "        'headerRemoved', csvFilename), 'w', newline='')\n",
    "    csvWriter = csv.writer(csvFileObj)\n",
    "    for row in csvRows:\n",
    "        csvWriter.writerow(row)\n",
    "    csvFileObj.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 Excel 到CSV 的转换程序  \n",
    "将多个excel文件保存csv文件。一个Excel 文件可能包含多个工作表，必须为每个表创建一个CSV 文件。CSV文件的文件名应该是<Excel 文件名>_<表标题>.csv，其中<Excel 文件名>是没有扩展名的Excel 文件名（例如'spam_data'，而不是'spam_data.xlsx'），<表标题>是Worksheet 对象的title 变量中的字符串。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "current file is: spreadsheet-A_Sheet.csv\n",
      "current file is: spreadsheet-B_Sheet.csv\n",
      "current file is: spreadsheet-C_Sheet.csv\n",
      "current file is: spreadsheet-D_Sheet.csv\n",
      "current file is: spreadsheet-E_Sheet.csv\n",
      "current file is: spreadsheet-F_Sheet.csv\n",
      "current file is: spreadsheet-G_Sheet.csv\n",
      "current file is: spreadsheet-H_Sheet.csv\n",
      "current file is: spreadsheet-I_Sheet.csv\n",
      "current file is: spreadsheet-J_Sheet.csv\n",
      "current file is: spreadsheet-K_Sheet.csv\n",
      "current file is: spreadsheet-L_Sheet.csv\n",
      "current file is: spreadsheet-M_Sheet.csv\n",
      "current file is: spreadsheet-N_Sheet.csv\n",
      "current file is: spreadsheet-O_Sheet.csv\n",
      "current file is: spreadsheet-P_Sheet.csv\n",
      "current file is: spreadsheet-Q_Sheet.csv\n",
      "current file is: spreadsheet-R_Sheet.csv\n",
      "current file is: spreadsheet-S_Sheet.csv\n",
      "current file is: spreadsheet-T_Sheet.csv\n",
      "current file is: spreadsheet-U_Sheet.csv\n",
      "current file is: spreadsheet-V_Sheet.csv\n",
      "current file is: spreadsheet-W_Sheet.csv\n",
      "current file is: spreadsheet-X_Sheet.csv\n",
      "current file is: spreadsheet-Y_Sheet.csv\n",
      "current file is: spreadsheet-Z_Sheet.csv\n"
     ]
    }
   ],
   "source": [
    "import openpyxl\n",
    "import os\n",
    "import csv\n",
    "\n",
    "inputPath = './excelSpreadsheets'\n",
    "outputPath = './outputSheets'\n",
    "\n",
    "# 创建文件夹\n",
    "os.makedirs(outputPath, exist_ok=True)\n",
    "for excelFile in os.listdir(inputPath):\n",
    "    # Skip non-xlsx files, load the workbook object.\n",
    "    # 跳过不是xlsx的文件\n",
    "    if not excelFile.endswith('xlsx'):\n",
    "        continue\n",
    "    # 输入文件\n",
    "    inputFilePath = os.path.join(inputPath, excelFile)\n",
    "    # 打开xlsx文件\n",
    "    wb = openpyxl.load_workbook(inputFilePath)\n",
    "    # 获得当前文件sheetName\n",
    "    for sheetName in wb.sheetnames:\n",
    "        # 设置文件\n",
    "        csvFileName = excelFile.split('.')[0]+'_'+sheetName+'.csv'\n",
    "        csvFile = open(os.path.join(outputPath, csvFileName), 'w', newline='')\n",
    "        print(\"current file is: {}\".format(csvFileName))\n",
    "        # 写csv文件\n",
    "        outputWriter = csv.writer(csvFile)\n",
    "        sheet = wb[sheetName]\n",
    "\n",
    "        # 遍历每一行数据\n",
    "        for rowNum in range(1, sheet.max_row+1):\n",
    "            # 保存每一行数据\n",
    "            rowData = []\n",
    "            for colNum in range(1, sheet.max_column+1):\n",
    "                # 保存每一列数据\n",
    "                rowData.append(sheet.cell(row=rowNum, column=colNum).value)\n",
    "            # 写入一行数据\n",
    "            outputWriter.writerow(rowData)\n",
    "        csvFile.close()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
